{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a12bb3ff",
   "metadata": {},
   "source": [
    "## Reinforcement Learning\n",
    "\n",
    "This example displays how to use reinforcement learning (RL) to train a policy to control a simple microgrid. We will train and deploy a simple Discrete Q-Network (DQN) policy on one of the *pymgrid25* benchmark microgrids.\n",
    "\n",
    "Algorithms for reinforcement learning are not built into *pymgrid*, nor are they a dependency. We recommend using one of [RLlib](https://docs.ray.io/en/latest/rllib/index.html) and [garage](https://garage.readthedocs.io/en/latest/); RLlib is better supported and has a wider variety of algorithms but can be less developer-friendly in some scenarios. This example will use garage; the API for RLlib is similar.\n",
    "\n",
    "To install garage, see the [garage documentation](https://garage.readthedocs.io/en/latest/user/installation.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "230beca4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "from pymgrid.envs import DiscreteMicrogridEnv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92121e85",
   "metadata": {},
   "source": [
    "### Defining the Environment\n",
    "\n",
    "Defining an RL environment is extremely straightforward. To define an environment on one of the benchmark microgrids, we simply call ``from_scenario`` on our choice of the ``DiscreteMicrogridEnv`` and the ``ContinuousMicrogridEnv``. \n",
    "\n",
    "Here, we will use the discrete environment and train a DQN on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ca0707cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "env = DiscreteMicrogridEnv.from_scenario(microgrid_number=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1453812d",
   "metadata": {},
   "source": [
    "Environments subclass [pymgrid.Microgrid](../reference/microgrid.rst) and thus have the same attribute and logging functionality:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a9d10e03",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LoadModule(time_series=<class 'numpy.ndarray'>, forecaster=OracleForecaster, forecast_horizon=23, forecaster_increase_uncertainty=False, raise_errors=False)\n",
      "\n",
      "RenewableModule(time_series=<class 'numpy.ndarray'>, raise_errors=False, forecaster=OracleForecaster, forecast_horizon=23, forecaster_increase_uncertainty=False, provided_energy_name=renewable_used)\n",
      "\n",
      "UnbalancedEnergyModule(raise_errors=False, loss_load_cost=10, overgeneration_cost=1)\n",
      "\n",
      "BatteryModule(min_capacity=290.40000000000003, max_capacity=1452, max_charge=363, max_discharge=363, efficiency=0.9, battery_cost_cycle=0.02, battery_transition_model=None, init_charge=None, init_soc=0.2, raise_errors=False)\n",
      "\n",
      "GridModule(max_import=1920, max_export=1920)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for module in env.modules.module_list():\n",
    "    print(f'{module}\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64eda080",
   "metadata": {},
   "source": [
    "### Setting Up the RL Algorithm\n",
    "\n",
    "As we mentioned, we are planning on deploying a simple DQN in this case.\n",
    "\n",
    "For ease of use, we will employ a simple ``LocalSampler`` that does not parallelize sampling. We will also use an ``EpsilonGreedyPolicy`` for exploration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1d174e6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from garage.experiment.deterministic import set_seed\n",
    "\n",
    "from garage.np.exploration_policies import EpsilonGreedyPolicy\n",
    "\n",
    "from garage.replay_buffer import PathBuffer\n",
    "\n",
    "from garage.sampler import LocalSampler, RaySampler\n",
    "\n",
    "from garage.torch.algos.dqn import DQN\n",
    "from garage.torch.policies import DiscreteQFArgmaxPolicy\n",
    "from garage.torch.q_functions import DiscreteMLPQFunction\n",
    "\n",
    "from garage.trainer import Trainer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ace8f2ae",
   "metadata": {},
   "source": [
    "Remainder Coming Soon."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c56b3d25",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
